Client Report - Project 0: Introduction

Course DS 250

Author

Ryan Lee

import pandas as pd
import numpy as np
from lets_plot import *

LetsPlot.setup_html(isolated_frame=True)
# Learn more about Code Cells: https://quarto.org/docs/reference/cells/cells-jupyter.html

# Include and execute your code here
from palmerpenguins import load_penguins
df = load_penguins()

QUESTION|TASK 1

Include the tables created from PY4DS: CH2 Data Visualization used to create the above chart

This table shows only the first and last five rows of the penguins dataset.

penguins = load_penguins()
penguins
       species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex  year
  0     Adelie  Torgersen            39.1           18.7              181.0       3750.0    male  2007
  1     Adelie  Torgersen            39.5           17.4              186.0       3800.0  female  2007
  2     Adelie  Torgersen            40.3           18.0              195.0       3250.0  female  2007
  3     Adelie  Torgersen             NaN            NaN                NaN          NaN     NaN  2007
  4     Adelie  Torgersen            36.7           19.3              193.0       3450.0  female  2007
...        ...        ...             ...            ...                ...          ...     ...   ...
339  Chinstrap      Dream            55.8           19.8              207.0       4000.0    male  2009
340  Chinstrap      Dream            43.5           18.1              202.0       3400.0  female  2009
341  Chinstrap      Dream            49.6           18.2              193.0       3775.0    male  2009
342  Chinstrap      Dream            50.8           19.0              210.0       4100.0    male  2009
343  Chinstrap      Dream            50.2           18.7              198.0       3775.0  female  2009

344 rows × 8 columns


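Calling “.info()” on the table prints a summary of its structure: each column's name, non-null count, and data type. This makes the data easier to read at a glance. Here, each of the four measurement columns is missing 2 values and the sex column is missing 11.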

penguins.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 344 entries, 0 to 343
Data columns (total 8 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   species            344 non-null    object 
 1   island             344 non-null    object 
 2   bill_length_mm     342 non-null    float64
 3   bill_depth_mm      342 non-null    float64
 4   flipper_length_mm  342 non-null    float64
 5   body_mass_g        342 non-null    float64
 6   sex                333 non-null    object 
 7   year               344 non-null    int64  
dtypes: float64(4), int64(1), object(3)
memory usage: 21.6+ KB
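
The non-null counts above imply that some rows are incomplete. As a minimal sketch, missing values can also be counted per column directly. The example below uses a small hand-built table copied from the first five rows shown in Task 1 (the name `sample` is only for illustration); on the full dataset the same call would be made on `penguins`.

```python
import pandas as pd

# Hand-copied from the first five rows of the table in Task 1;
# row 3 has no measurements and no recorded sex.
sample = pd.DataFrame(
    {
        "species": ["Adelie"] * 5,
        "bill_length_mm": [39.1, 39.5, 40.3, None, 36.7],
        "body_mass_g": [3750.0, 3800.0, 3250.0, None, 3450.0],
        "sex": ["male", "female", "female", None, "female"],
    }
)

# isna() marks each missing cell as True; sum() then counts them per column.
missing = sample.isna().sum()
print(missing)
```

A count of zero means the column is complete, which matches how “.info()” reports 344 non-null values for species on the full data.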

QUESTION|TASK 2

Recreate the example charts from PY4DS: CH2 Data Visualization of the textbook.

Here we take the penguins data and compare body mass with flipper length. The data is now shown as a scatter plot instead of a table.

ggplot(data=penguins, mapping=aes(x="flipper_length_mm", y="body_mass_g")
) + geom_point()

When we map color to species, we can see that the data covers three penguin species. Adelie and Chinstrap penguins tend to have lower body mass, while Gentoo penguins tend to have higher body mass and longer flippers.

ggplot(
  data = penguins, 
  mapping = aes(x = "flipper_length_mm", y = "body_mass_g")
  ) + geom_point(mapping = aes(color = "species")) 
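
One way to back up a visual comparison like this with numbers is to average the two plotted columns per species. The sketch below assumes a hypothetical helper named `mean_by_species` and demonstrates it on four rows copied from the table in Task 1. Those rows contain no Gentoo penguins and are far too few to be representative; on the full data, `penguins` would be passed instead.

```python
import pandas as pd


def mean_by_species(df):
    """Average flipper length and body mass for each species.

    Missing values are skipped by pandas when computing the mean.
    """
    cols = ["flipper_length_mm", "body_mass_g"]
    return df.groupby("species")[cols].mean()


# Rows 0, 1, 341 and 342 from the table shown in Task 1.
rows = pd.DataFrame(
    {
        "species": ["Adelie", "Adelie", "Chinstrap", "Chinstrap"],
        "flipper_length_mm": [181.0, 186.0, 193.0, 210.0],
        "body_mass_g": [3750.0, 3800.0, 3775.0, 4100.0],
    }
)

summary = mean_by_species(rows)
print(summary)
```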

Adding geom_smooth(method="lm") draws a fitted straight line (a linear model) for each species, which summarizes the trend in the points rather than averaging them. While the green points looked like they sat farther left than the red points, the red line lies above the green line.

ggplot(
  data=penguins,
  mapping=aes(x="flipper_length_mm", y="body_mass_g", color="species"),
  ) + geom_point(
  ) + geom_smooth(method="lm")
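
For intuition about the kind of line that method="lm" draws, here is a minimal sketch of an ordinary least-squares fit in plain Python. The `fit_line` helper is hypothetical and is not taken from lets-plot; it only illustrates the idea. The points are the last five Chinstrap rows from the table in Task 1, which are too few for a meaningful estimate.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# flipper_length_mm and body_mass_g for rows 339 to 343 of the table.
flipper = [207.0, 202.0, 193.0, 210.0, 198.0]
mass = [4000.0, 3400.0, 3775.0, 4100.0, 3775.0]

slope, intercept = fit_line(flipper, mass)
print(f"body_mass_g = {slope:.2f} * flipper_length_mm + {intercept:.2f}")
```

A positive slope means heavier penguins tend to have longer flippers within the group, which is what an upward-sloping line in the chart shows.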

Because color is mapped only inside geom_point here, geom_smooth fits a single line through all of the penguins, which shows the overall trend across species.

ggplot(
  data=penguins, 
  mapping=aes(x="flipper_length_mm", y="body_mass_g")
  ) + geom_point(mapping=aes(color="species")
  ) + geom_smooth(method="lm")

This chart was practice in giving each species its own point shape as well as its own color. Distinct shapes can help readers who are color blind and make the groups easier to tell apart.

ggplot(data=penguins, mapping=aes(x="flipper_length_mm", y="body_mass_g")
  ) + geom_point(mapping=aes(color="species", shape="species")
  ) + geom_smooth(method="lm")